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LINEAR ESTIMATION AND RELATED TOPICS 


INTRODUCTION 

The purpose of this paper is to present a collection of theorems, definitions, 
and other facts about the multivariate Gaussian distributions, to give desirable 
properties of linear estimators, and to examine some statistical techniques of 
linear estimation which are now used in the differential correction of orbits of 
near earth satellites. The basic concepts of classical orbit correction will be 
presented in the first part of the paper to show where the need arises for esti- 
mating such quantities as orbital parameters, geodetic parameters, etc. The 
motivation for writing this paper has arisen from the necessity of determining 
satellite orbits from tracking station data which is known to be contaminated 
with noise. 

The different estimation techniques to be discussed are (1) unweighted least 
squares, (2) conventional weighted least squares, (3) maximum-likelihood, 

(4) Bayes, and (5) Kalman-Schmidt filter. All of these techniques except Kalman- 
Schmidt utilize all the observations simultaneously in arriving at an estimate for 
the orbital parameters. The Kalman-Schmidt filter permits a complete optimum 
estimate of the orbital parameters from each single observation and hence oper- 
ates sequentially on the observations one at a time in the order of their occurrence. 
It has been shown reference [15] that the Kalman-Schmidt technique is equivalent 
to a linear least squares method and that the covariance matrices of the orbit 
parameters are the same. A comparative study of the different techniques based 
on similar notations will aid the astrodynamicist in selecting among them. 

Conventional Differential Orbit Correction 


Every observed or measured quantity contains errors of unknown magnitude 
due to a variety of causes, and hence a measurement is never exact. The result- 
ant error in a given quantity is the difference between the measurement and its 
true value. For a single quantity which has been determined by observation, 
neither the resultant error nor any of its individual parts can ever be determined 
exactly, but can be fixed within certain probable limits. 

One may describe a differential correction procedure as a systematic method 
for using the residuals to improve the values of a set of orbital parameters until 
the "best" set has been obtained. The differences between the actual measure- 
ments or observations and the computed values of the quantities observed as a 


1 



function of the state vector x (t 0 ) are called the residuals. The basic concepts 
of a differential correction procedure can be illustrated as follows: 

Let y i be an observation or measurement of some quantity such as right 
ascension or declination at time t = t L and let e i be the associated observa- 
tional error. Then there exists a function Fj (x (t 0 ); ti ) of the unknown state 
vector x (t 0 ) such that 


Yi =F i (x(t 0 ) ; t . ) + e.. 

Let y ( x (t Q ); t .) be the computed value of the quantity observed at time 
t = t based upon an estimated state vector x (t Q ) at time t = t 0 . If we assume 
that x (t Q ) is sufficiently close to x (t Q ) so that squares and higher powers of 
8x (t 0 ) = x(t 0 ) - x (t 0 ) can be neglected, then Fi (x(t 0 ); t A ) may be expanded 
in a Taylor series to a first order approximation about the estimated state vector 
x (t Q ), i.e. 


Yi - ^ (x(t 0 ); t.) + 


V i 


/ 


—TT) (x. - X.) + e. , (i = 1, 2, . . . , n). (1) 


Way.,, 


If we assume that the functions F A (x(t 0 ); ) can be replaced by the known 

functions y £ (x(t 0 ); t. ) equations (1) become 


y. ~ y. ( x (t Q ); t.) + 



, n). 


( 2 ) 
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In matrix form the n linear equations (2) become 


Sy(t) = H(t) 8x(t 0 ) + e 


( 3 ) 


where 



s yi( t i> 


y^tj) - y x (x(t 0 ) ; tj) 


e i 

s y(t) =. 

s y 2 ( t 2 > 

1 = 

y 2 (t 2 ) - y 2 (x(t 0 ) ; t 2 ) 

. e = 

e 2 


^(tn) 


y n ( t n> -y^^o) ; *„> 
L j 

nx 1 

e 

n 




. Sx(t Q ) = x(t Q ) -x(t Q ) = 






X 


p 


— Jpx 1 


The unknown state vector x (t 0 ) is the variable which is to be found by solving 
equation (3). Assume for the moment that 'ey (t) is not affected by observational 
errors but only by errors in the initial state vector x (t Q ). It is known from the 
theory of linear equations for the case where n = p , that a necessary and suffi- 
cient condition that equation (3) has a unique solution is that the determinant of 
the matrix H (t) does not vanish. If the observations were not affected by errors 
an exact solution to equation (3) would be obtained. The presence of accidental 
errors in Sy (t ), however, prevents the true value of x (t Q ) from being deter- 
mined and hence only approximate values can be found. By increasing the num- 
ber of observations the effect of the observational errors can be diminished. 
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If n > p, i.e., the number of observations is greater than the number of 
unknowns, then the resulting linear system of equations is overdetermined and a 
least square criterion is applied to fit the trajectory to the observations. In 
practice a weighting matrix or diagonal matrix W is applied to equation (3). If 
the observation errors are included in the matrix of residuals equation (3) becomes 


WH Sx = W Sy 

(WH) t (WH) Sx = (WH) t W Sy 


( 5 ) 


Sx = ((WH) t (WH))- 1 (WH) t W Sy 

Sx = (H t (W t W) H)” 1 H t (W t W) Sy 

Sx = (H t W 2 H)" 1 H t W 2 Sy . (6) 


The solution of the normal equation (5) provides Sx (t Q ) given by equation (6), 
and the original estimated state vector x (t 0 ) is improved to give a new estimate 
of x 

%*,<*•> + 8 x(t o>- 

This solution is a least squares estimate of the correct solution based on n 
observations. In practice an iterative procedure is employed on the new state 
x(t Q ) to obtain a further improved state vector using as a test for convergence 
"some statistical measure of the magnitude of the residuals. For example, at the 
Smithsonian Institution Astrophysical Observatory, the iteration is assumed to 
converge if the change of the standard deviation a on two successive iterations 
differs by less than 1 part in 100, or 


cr. 

1 

Multivariate Gaussian Distributions 

The scientific investigator often knows or is willing to assume that the 
population from which he takes observations is of a certain functional form. 
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Usually one knows very little about the distribution of errors in the data and the 
assumption is generally made that the errors are normal. 

The smoothing or estimation- problem can be described as follows: One has 
a set of data which is known to be the sum of desired information and some ran- 
dom errors. The problem is to extract the desired information. A concrete 
form of this is given by equation (3) where the state vector x(t) is to be deter- 
mined. In the statistical literature this is referred to as "regression analysis" 
or "linear analysis." 

The question of choosing a criterion for smoothing and estimation has a long 
history. In 1799 La Place encountered the question and proposed the use of the 
L,, norm. Since one could not perform any computation with this norm, Legendre 
proposed the L 2 or least squares norm in 1805. Since that time there has been a 
considerable amount of controversy about this question. The principle of least 
squares states that the best estimate x of x is that number which minimizes the 
sum of the squares of the deviations of the measurements from their estimate. 
The principle of least squares is normally defended on the basis of the assump- 
tion that the errors are normally distributed. It is undoubtedly true that the L 2 
norm is efficient in such a situation, probably the most efficient possible. 

Since the normal distribution is used very frequently, it is felt that some 
background information in the way of theorms and definitions may be of value in 
discussing the different estimation techniques. 

Let Y = {y lS y 2 , . . . , y p } be a p- dimensional random vector with mean 
vector fj. = EY = , /x 2 , . . . , p } . 

Definition 1 . The random vector Y is normally distributed in p dimensions 
if the joint density of y x , y 2 , • • •, y p is 

f (Y) = f(y r y 2 , • • • , y ) =-^ e'" 2 <^> T *(y-m) (1) 

(2tt)p /2 


(-00 < y. < 00, i = 1, 2, • • • , p) 


where R is a positive definite symmetric matrix whose elements r.^ are con- 
stants, ^ is a p x l vector whose elements /x. are constants. We observe that 
for the special case p = 1 that R = r n which must be positive. If we set r n = 
l/cr 2 , it is clear that the frequency or density function f (Y) is the normal 
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distribution in a single variable. The quantity S = (Y - /x) T R(Y- y a)is called 
the quadratic form of the p -variate normal. It is a quadratic form in the ele- 
ments y . - /x. . It can be written as 


S = 



(y i - V (y } -^) r... 


( 2 ) 


In order for f (Y) to qualify as a density function it is essential that f (Y) - 0. 
This is clear from the fact that the determinant of a positive definite matrix is 
positive. It is also necessary that the integral of f (Y) be equal to 1. A conse- 
quence of this fact is 


J pOO *CC 

' • • • I e -i/2(Y-#t) r ( y ~m) dy. dy, • • • dy = — • 

J-CC 1 2 P iRl " 2 


Definition 2. The expected value of a matrix or vector A which is written 
E (A) is defined as the expected value of each element of A. For example, if 


A = 


a n a i2 


a 21 a 22 


E(A) = 


'E(a n ) E(a, 2 ) 
E ( a 2 1 > E ( a 22> 


Definition 3. Let the p x 1 random vector Y be distributed as the p -variate 
normal; then E(Y) = /x. 

Definition 4 . The variance of the random variable y. = E[y. - E(y .)] 2 = 
E[y. - /x. ] 2 , and the covariance of the two random variables y £ and y. is 
E(y. - /lx.) (y. - /j..) , i / j. In a p-variate normal there will be p variances 

one for each random variable y., and p(p - l)/2 covariances. A p x p matrix 
has the covariance of y. and y. as its ij th element if i / j , and the variance 
of y i as its i th diagonal element. Hence 
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f e 


11 

°12 * 

• 

21 

a 22 * * 

• % 


V 


cr a „ 

pp p 2 


a 


pp 


/ 


where cr. . .= cr . . = E( y. - /x. ) ( y j - /lx. ) or in matrix notation V = E(Y - /x) 

( Y - /lx ) T . V is called the covariance matrix of the vector Y. 

Theorem 5 . In the p-variate normal the matrix R is the inverse of the 
covariance matrix V, i.e., V -1 = R or also R _1 = V. As a consequence of this the 
p -variate normal can be written as 

1 e -l/2(Y -/x) T v"l (Y-/x) _ 

|v| 1/2 (277)p /2 


If the covariance matrix V is a diagonal matrix then the two random 
variables y. and y. are independent. Clearly the non diagonal elements 
are 0 if and only if the correlation between the two random variables y. 
and y. is 0, since the correlation is defined as 


A J / 

V cr. . cr. . 

11 J J 

Theorem 6 . Let the vector Y have a p-variate normal density with mean/x 
and covariance matrix V, and let a, , a be a set of constants. Then 

1 2 p 

Z = 2 a. y. is distributed as the univariate normal with mean 2 a. ll . and 

i=l 1*1 1 ^1 

variance 

Z] a — + Z] 

i 

i 

Example. Let the 3 x l vector Y have a p-variate normal density, where 



M = 




0 3 

1 0 
0 5 
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The covariance matrix is 

fs 0 

R" 1 = V = 0 1 

V - 3 0 

HereE(y 1 ) — fj . j — 3, E(y 2 ) — ~1> E(yg) — ^3 0. ~ ^12 ^21 — ^ 1 

crj 3 = ct 3 i = -3, o- 22 = 1, a 23 = cr 32 = 0, ct 33 = 2. We say that the mean of y 2 
is -1 and the variance of y 2 is 1. If Z = y t -3y 2 +2y 3 with = 1 , a 2 = -3, 
a 3 = 2, then 



E(Z) = 1, E(yj ) - 3E(y 2 ) + 2E(y 3 ) = 1, 3 - (3) ( - 1) + 2(0) = 6 


3 

var(Z) = a i a i + 

i=l 



a. a. a. . 

1 ) 1 j 


= a i a i + a 2 CT 2 + a 3 CT 3 2 + 2 [a i a 2 CT X2 + a i a 3°13 + a 2 a 3°23 ] 


= (l) 2 (5) + (- 3) 2 (1) + (2) 2 (2) + 2 [(1) (-3) (0) + (1) (2) (— 3) + (-3) (2) (0) ] 


var (Z) = 5 + 9 + 8 + 2 [0-6+0] = 22-12 = 10 


Theorem 7 . If a vector Y is distributed with mean 0 and covariance matrix 
a 2 1, the expected value of the quadratic form Y T AY is equal to a 2 tr (A). 


DESIRABLE PROPERTIES OF LINEAR ESTIMATORS 


It is perhaps appropriate to review some important desirable properties 
of estimators: (1) sufficiency, (2) unbiasedness, (3) consistency, (4) efficiency, 

(5) minimum variance, (6) completeness, and (7) invariance under transformation. 


Sufficiency. Let f(y x , y 2 , . . . ,y n ; 6 X , 6 2 „ 
function involving K parameters. The statistics 6 X 

= h 2 (Vi- y 2 » — yJ> - • K = K(yi> y 2 > - • • 


, 0 ) be a joint frequency 

= \ (y v y 2 >* • • ’ y n 

, y n ) are a set of sufficient 


8 



statistics if g( y t » y 2 » • • • * y I 9 V & 2 » * • • » 6 m H s the conditional density func- 
tion of the observations given the statistics. The original observations y , y 2 , 

. . . , y n always form a set of sufficient statistics but generally what is wanted 
is a minimal sufficient set. 

Unbiasedness. An estimate 8 is said to be an unbiased estimator of 9 if 
E(0) =9. 

Consistency. An estimator & n (A sequence of estimators {0 n > ) is said to 
be consistent for 8 if the limit of the probability that 6 n - 9 = 0 is 1 as n -* co . 

6 is the estimate of 9 based on a sample of size n. 

Efficiency. An estimator 9 n (A sequence of estimators { 9 n } ) is said to be 
efficient when the following two conditions are satisfied: (1) /n ( 9 n - 6) is 
asymptotically normally distributed with mean 0 and variance a 2 where n is the 
sample size, (2) the variance cr 2 is less than the variance of any other estimator 
6* that satisfies condition 1. 

n 

Minimum variance. An estimator 9 is said to be a minimum variance 
estimator of 9 if 

E [8 -E(0)] 2 ^E [9* -E (9*)] 2 

where 9* is any other estimator for 6 . If an estimator is efficient, then it is 
consistent and unbiased in the limit but need not be unbiased for finite sample 
sizes. An unbiased estimator is not necessarily consistent. 

Completeness. An estimator 9 is complete if there exists no unbiased 
estimator of zero in the frequency function f( 6 ; 8) (except zero itself). 

Invariance. An estimator 6 of 9 is said to be an invariant estimator for 
a certain class of transformations g if the estimator is g {9) when the transform- 
ation changes the parameter to g (9). 

The mean-squared error can be written as 

E[(0 - 8) 2 ] = E [{8 - E(9)} - {9 -E(0)}] 2 = var (9) + [8 - E (8)J 2 . 

The term 9 - E (9) is called the bias of the estimator and can be either 
positive, negative, or zero. If an estimator can be found with bias close to zero 
and such that var (8 ) is small, the mean squared error will be small. Thus it 
seems that it may be desirable to have an estimator whose bias is 0. If we con- 
sider only estimators which are unbiased, then E (9) = 9, and the mean-squared 
error of an unbiased estimator is equal to the variance of the estimator. Thus 

A A A 

the mean squared error becomes E [ ( 6 - 9 ) 2 ] = var ( 9) if 9 is unbiased. 
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Definition 8 - Minimum- variance unbiased estimator. Let y x , y 2 ,. • • , y n 
be a random sample from F(y; 0). Let d = d(y x , y 2 , • • • , y n ) be an estimator 
of 0 such that (a) E(0) = 9 ; that is, 9 is unbiased (b) var ( Q ) is less than the 
variance of any other unbiased estimator. Then 9 is the mini mum- variance 
unbiased estimator of 9 . 

Definition 9 - Maximum Likelihood Estimators. The likelihood function of 
n random variables y x , y 2 , • • • , y n is the joint density of the n random vari- 
ables L (y x , y 2 , • • • , y n ; & 1 , 0 2 , • • • 9 K ) which is considered to be a function of 
the k parameters. In particular, if y x , y 2 , • • • , y n is a random sample from 
the density f (y x » y 2 , • • • , y ; & 1 , 0 2 ,•••>' 0 K ) then the likelihood function is 

n 

L(6 V 0 2 6 K ) = 1~[ f(y.; 9 V d y dj. 

i = l 

Definition 10 - Maximum Likelihood Estimator. If the likelihood function 
contains k parameters, i.e., 


n 

L(< 9 j, e r , & K ) = "j - [" f (y i ; e v e 2 , ... , e K ) 

i = l 

then the maximum likelihood estimators of the parameters 0 x , & 2 , • • •, are 
the random variables 6 1 = d x (y x , ^y 2 , • • , y n )[ 6 2 = d 2 ( yi , y 2 , • • • , y n ); • • • , 

<9 k = d K (y x , y 2 , • • • , y n ), where 6 , $ 2 , . . . , 0 K are the values in the parameter 

space which maximize L(<9 1( d 2 0 K ). If certain regularity conditions are 

satisfied, the point where the likelihood is a maximum is a solution of the 
equations 


3L 


0V ‘ *. 5 k>=° 


3L 

3 ^ 


(0 V 0 2 , 0 K ) = 0 


3L 

3 9 „ 




e„) = o. 
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Theorem 11 . The maximum likelihood estimators 6 X , . . . , & K for the 

parameters of a density f ( y 1 , y 2 , • • • , y n ; 6 , 6 V • . • , d K ) from samples of 
size n are, for large samples, approximately distributed by the multivariate 
normal distribution with means 0 1 , 6 6 and with matrix n R in the 

quadratic form, where 


r . 

* j 


B 2 


3 9 . 9 


log f(y r y 2 , 


J x , 6 r , t> K ) 


The variances and covariances of the estimators are (l/n) V, where v = R -1 . 


Definition 12. The model Y = $ 6 + e where Y is a random observed vector, 
e is a random vector, $ is an n x p matrix of known fixed quantities, 6 is a 
p x l vector of unknown parameters is called a general linear model of full rank, 
if the rank of $ is equal to p where p £ n . 

Theorem 13 . If the general linear hypothesis model of full rank Y = O 9 + e 
is such that the conditions on the random vector are 


E(e) = 0 
E(e e T ) □ cr 2 1 


the best (minimum variance) linear unbiased estimate of the vector Q is given 
by least squares; i.e. 9 = ($ T 0)~ 1 0 T Y is the best linear unbiased estimate of 
6 . This estimate is the same as the ma x im um likelihood estimate under normal 
theory. This theorem is called the Gauss - Markhoff theorem. 

Under certain quite general conditions the maximum likelihood method gives 
rise to estimators that are consistent, efficient, and sufficient. 

If the method of maximum likelihood is to be used, the form of the frequency 
function must be known. The maximum likelihood estimators also possess a 
property which is called invariance, i.e., if 6 = d(yj, y 2 , • • • , y n ) is the maxi- 
mum likelihood estimator of 9 in the density f (y; 9) and u ( 9) is a function having 
a single valued inverse, then the maximum likelihood estimator of u (9) is u(i9). 

The maximum likelihood estimator is not always unbiased, but frequently it 
can be changed so that it becomes unbiased. For example 
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v 2 =— y (Xj - x) 2 

n / -j 1 

is a biased estimator of a 2 but n/(n - 1) a 2 is unbiased. 


LEAST SQUARES ESTIMATORS 

In the method of least squares the form of the frequency function need not 
be known, but the method can be used if it is known. In many important cases, 
even if the form of the frequency function is unknown, the method of least squares 
gives rise to estimators that are unbiased and consistent and under certain con- 
ditions minimum variance unbiased. If the observations are uncorrelated and of 
equal weight the method is simply called unweighted least squares. If the assump- 
tion is made that the observations are uncorrelated and weighted unequally, the 
method will be referred to as weighted least squares. Generalized least squares 
is the treatment which assumes the observations to be correlated and to have the 
general multivariate normal distribution. In the case of weighted least squares 
the weighting matrix is a diagonal n x n matrix W whose elements are the 
weights W. . 

First we will derive the estimates for the case of weighted least squares. 

The estimates for unweighted least squares are then easily obtained by taking 
the weighting matrix to be the identity matrix. Consider 


8y(t) =H(t) Sx(t 0 ) + e 

and assume E[e] =0, E[ee T ]= cr 2 W _1 . Let the least squares value of Sx , 
i.e., the value of Sx{t 0 ) as obtained from the normal equations, be denoted by 
8x (t 0 ). Let/W be a diagonal matrix of order n the elements of the diagonal 
being the square root of the weights W . In the case of equal weighting, /W is 
the unit or identity matrix. Let the column matrix /W e of residuals be defined 
thus: 


/W e = /W (8y - H Sx) 

The sum of the squares of the residuals, e T We, may be written 


e T We = (Sy - HSx) T W (8y _ H Sx) 


12 



where 


W = v/w = /w /w . 


Hence 

e T We = [ (S y) T - (Sx) T H T ] W(Sy-HSx) 


= (Sy) T W(Sy)-(Sy) T WH(Sx)-(8x) T H T W$y + (Sx) T H T WH (Sx) . 


By the method of least squares we shall find the value 8 x such that the sum of 
weighted residuals e T We is a minimum. The value of Sx that minimizes e T We 
is given by the solution to 


~ (e T We) =■—■ [ (Sy) T WSy-(8y) T WH($x) - (8x) T H T W (Sy) + 
(Sx) T (H t WH) (Sx)] = 0 


= 0-H T WSy- H t W Sy + 2H T WHSx = 0 
= 2 H T W H S x - 2 H t W Sy - 0 

The weighted least squares estimate of Sx (t 0 ) is therefore, 

Sx (t Q ) = (H t WH) _ 1 H t W Sy 

which is the same as the maximum -likelihood estimate under normal theory. 
To examine Sx (t Q ) for unbiasedness, we proceed as follows: 

E [Sx] = E [(H t WH) _ 1 H t W Sy] = (H T WH)' 1 H T WE [Sy] 

= (H t WH) - 1 H t WE [HSx +e] = (H T WH) -1 (H T WH) Sx = I Sx =.8x . 

So Sx is an unbiased estimate of Sx. 
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The covariance matrix of Sx is 


cov (8x) = E [(Sx - Sx) (Sx - Sx) T ] 

= E [(H T WH) _1 H T WSy-Sx)((H T WH)" 1 H T WSy-Sx) T ] . 

If we substitute HSx + e for Sy we get 

cov (Sx) = E [{(H t WH) _ 1 H t W (H Sx + e) - Sx} 

{(H t WH) _ 1 H t W (H Sx + e) - 8x) T ] 

= E [{(H t WH) _1 (H t WH) Sx + (H t WH) - 1 H t We - Sx.} 

{ (H t WH) _1 (H t WH) Sx + (H t WH) - 1 H T We - Sx}] 

= E [{(H t WH) -1 H t W e } {(H t WH) _ 1 H T We}] T 
= E [ (H t WH) - 1 H T We e T WH(H T WH) _1 ] 

= (H t WH) - 1 H t W E [e e T ] WH(H T WH) -1 , E [e e T ] = a 2 W 

= (H t WH) _ 1 H T Wcr 2 (W -1 W) H(H t WH) - 1 
cov (sx(t 0 )) =0-2 (H t WH) -1 . 

In order to get an unbiased estimate <r 2 of c 2 we proceed with 
e T We = (Sx) T [H t WH Sx-H T WSy] - (H T W8y) T §x + (Sy) T W Sy 
= (Sy) T W Sy - (H t W Sy) T Sx 

Since, by virtue of the normal equations, (H T WH) Sx - H T W Sy = 0. Hence 
replacing Sx by (H T WH) _1 H T W Sy gives 


e T We = (8 y) T W Sy - (H T W Sy) T (H T W H)" 1 H T W Sy 


= (5 y ) T W Sy - (Sy) T W H (H T W H) _1 H T W Sy 
= (Sy) T [I - W H (H T W H) _1 H T ] W Sy. 

Now let E [8y] be denoted by rj . If the Sy's are all free of error, i.e., if 
Sy = -q = E [Sy], it is clear from 

0 = E [e] = E [Sy - HSx] = E[Sy] - HE [Sx] =Sy-HSx = e 

that e = 0. Consequently 

e T We = rj r [I-WH(H t WH ) _1 H t ] Wtj = 0. 

Hence 

E [e T W e] = E [ (S y ) T {l-WH(H T WH) _1 H T } WSy + 7? T {l-WH(H T WH)' 1 H T }Wr ) ] 
= E [ (Sy - T7 ) T (I-WH(H t W H)~ 1 H T ) V(Sy--rj)] . 

We shall now invoke a theorem on the expected value of a quadratic form. 

Let 

Y = S y - 7] , E [ 8 y - 7} ] □ 0, E [Y Y T ] = a 2 W " 1 . 

Let 

(bi^ = I - WHCH^H)' 1 H t , W = diag (W x , W 2 , •••, W B ). 

Let 

A = [I - W H (H T W H)“ 1 H t ] W. 
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E [Y T AY] = E 




Li = x 


= b„ W, E ty»] + b„ w 2 E [yf ] * • • ■ + b„ W„ E [y 2 ] 


= b„ (*, "•?) + b J2 (W 2 o-|) + ••• + b„ (W.o-J] 


a 2 b ii <*i = cr2/a ?; > = 1. 2. — . n) 

i=l 

o ' 2 tr [I - WH (H t WH )" 1 H t ] . 


Therefore, 

E [e T We] = o - 2 tr [l n - W H (H T W H)~ 1 H T ] 

= cr 2 [tr I n - tr WH(H t WH ) _1 H t ] (tr A + B = tr A + tr B) 

= cr 2 [n-tr (H t WH) (H T WH)*’ p ] (tr ABC = tr CAB) 


-a 2 [n - tr I ] 
P 


-a 2 (n - p) 


since I n and I p are unit matrices of orders n and p respectively. In a weighted 
least squares fit to the data an unbiased estimate cr 2 of o- 2 is given by 


~ 0 
a 4 


e T We 
n - p 


since 


E [& 2 ] = — i— E [ e T W e ] = — L_ a 2 (n - p) = cr 2 . 


Given the linear relation between observations and the required orbit 
information 


8y (t) = H(t) 8 x (t 0 ) + e, 

the following estimates obtain (a) unweighted least squares 

8x (t Q ) = (H t H)" 1 H t Sy 


(b) weighted least squares 


Sx (t Q ) = (H t WH) _1 H t W Sy 
(c) generalized least squares (correlated observations) 


Sx(t 0 ) = (H t Q- 1 H)' 1 H t Q- 1 Sy 


In the case of correlated observations it is necessary to calculate the inverse 
of the covariance matrix Q and this is especially difficult when the Q matrix is 
ill conditioned. 


BAYES ESTIMATORS 

Bayes methods is a name given to statistical methods that introduce dis- 
tributions of parameters at some stage of their development. Bayes procedures 
are based on information, experiences, and hunches of the individual. Their 
motivation is that, although the state of nature may not be known, it is unusual 
when one does not have some information about the state of nature, which could 
and should be used to assist in a decision. The Bayes principle assigns to each 
possible action a measure of the consequences of that action, the Bayes solution 
then being to take the action with the minimum Bayes measure. This measure 
is taken to be the average of the losses for a given action with respect to the 
prior distribution on the state. 

A decision rule or strategy that is to lead to a decision based on the obser- 
vations y must take into account that y can have many values. A rule for choos- 
ing an action is not then complete until it prescribes an action a for each con- 
ceivable value of y . Such a rule is called a statistical decision function: a = d (y). 
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This is a mapping from the space of possible observation points to the space 
of actions. The action taken is random, since the data to which the decision rule 
is applied are random. Hence, the loss is random: •£ (9, d(y)). It is customary 
to base the analysis on the expected value of this loss: 

R (9, d) = E d [1(9, d (y)] . 


The average is computed with respect to the distribution of the data y . This 
distribution of y depends on 6 , so the dependence of the function R (9, d) on 9 
enters through the 9 in ^ (9, a) and also through the 9 in the distribution of y . 
The expected loss R (9, d) is called the risk function. In many situations, 6, is 
or may be regarded as a random variable and we may have an a priori knowledge 
of its probability density function g(0). Assuming continuous distributions, we 
find the density of the posterior distribution h(6 l | y) as follows: 


h<a|y) = g °y < yk) 

f (y) 

where g (9) is the prior density, f (y\9) is the conditional probability density of 
the data y given that the parameters have the value 9, and f (y) is the absolute 
density of y defined by 

f (y) = J f (y I 9 ) g (9 ) d<9 = E g [ f (y | 9 ) ] . 

The posterior distribution can be used as a tool in computing Bayes strategies. 
The method is to compute the posterior expected loss: 

E n [i (9, a)] = f ■i (9, a) h (9 | y) d<9 

and then to determine a that minimizes this quantity. The dependence of the 
chosen a on the given y provides a decision function a = d*(y), which is pre- 
cisely the Bayes decision function for the given prior distribution. To establish 
this claim, we manipulate the expression for the Bayes risk corresponding to a 
decision function d (y): 

B (d) = E g [R (9, d)] = Jr (9, d) g (9) 69 

= JJ l (9, d (y>) f (y i 9) g (9) dy 69 
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r 


or, if the order of integration is interchanged, as 

B (d) = J [/ i (&, d (y)) h (9 | y) d<9] f (y) dy 

= | E n [i ( 6 , d (y)] f (y) dy . 

Now the inner integral is the second moment about the point a = d (y). Since 
the second moment of a variable is a minimum when it is taken about the mean 
of the variable, it follows that this integral is minimized for each value of y if 
d is chosen as the mean of the conditional distribution of 6 for y fixed. From 
the last line of B(d) we see that the integral is minimized by a function d(y) whose 
value for each y is selected to minimize the expected posterior loss, E n [-£(#, a)]. 

In summary, the Bayes estimator of 9 is a function d (y) = E[0|y]. The 
problem of finding a Bayes solution is now seen to reduce to the problem of 
finding the conditional expected value of the parameter 6 when the variable y 
is held fixed. This solution is quite general because y may be chosen to be a 
statistic, such as a maximum likelihood estimate, from which an estimate a = d(y) 
is to be constructed, or it may be a vector variable. Consequently, in estimating 
the mean of a normal distribution, y might well represent a vector variable. 

Like the maximum likelihood estimators, it can be shown under quite general 
conditions that the Bayes estimator is consistent, efficient, and sufficient. In 
addition it can be shown that the Bayes estimator differs from the maximum 
likelihood estimator by an amount which is small compared with 1/ /n, where n 
is the sample size. 


STATISTICAL FILTER THEORY ESTIMATORS (KALMAN-SCHMIDT) 

In reference [2] R. E. Kalman presented a new approach to linear filtering 
and prediction problems. He introduced an alternative approach to a linear 
dynamic system (Wiener filter) which accomplishes the prediction of a random 
signal. In this paper is found the formulation and solution of the Wiener problem 
using state and state transition concepts from control theory. 

R. E. Kalman developed optimal estimates using the idea of orthogonal 
projection in a multidimensional vector space (linear manifold). A careful 
analysis, reference [9] , reveals that the Bayes estimation and the filter theory 
approach are basically the same. The two approaches differ in (1) the manner 
in which the linearization of a basically non-linear process is performed and in 
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assuming Gaussian distributions, and (2) in the type of equations used in estima- 
tion. In Bayes estimation the equations are expressed in a manner whereby the 
estimate is obtained by operating on the entire set of observations at a time. In 
the filter theory approach the procedure for estimation involves taking the obser- 
vations one at a time in the order of occurrence. It has been adequately shown 
in reference [9] that for the case of uncorrelated observations the methods are 
equivalent and the Bayes estimate is identical to the estimate obtained by the 
filter theory approach. 

There are some fundamental differences between the classical least-squares 
method and the Kalman-Schmidt technique. First the least squares estimate is 
based on the minimization of the sum of squares of weighted residuals and the 
Kalman-Schmidt technique minimizes the expected value of a certain risk function. 
This will be explained in detail in the derivation of the Kalman-Schmidt technique. 
Secondly, the data is processed serially instead of in parallel as in least squares. 

The tracking data is processed serially and the nominal orbit is updated as 
follows: A priori estimates of the state vector and its associated error exist 
prior to the actual processing of tracking data and before an improved estimate 
of the orbit is obtained. Both a priori estimates are updated to an instant of time 
when tracking data becomes available. Revised estimates of the nominal trajec- 
tory and the associated covariance matrix are obtained based upon the tracking 
data available for this time. These revised estimates remain until additional 
data becomes available and then the estimates are again updated to the time of 
the new observation. This process continues until all observations have been 
used. 

There are computational advantages which exist when the Kalman-Schmidt 
technique is employed but do not exist in the method of least squares. 

The Kalman-Schmidt method requires a fewer number of iterations than the 
method of least squares. This is so because the linear assumptions required for 
the updating theory are violated to a lesser degree than in the method of least 
squares where the estimate of the state vector is based on an entire sequence of 
observational residuals spread over an extended time arc. Another advantage of 
the Kalman-Schmidt method is that it avoids the inversion of large matrices. In 
least squares the order of the matrix to be inverted is equal to the number of 
quantities to be estimated while in the Kalman-Schmidt method the order of the 
matrix to be inverted is equal to the number of different types of tracking data 
available at a particular instant of time. The Kalman-Schmidt technique processes 
the tracking data sequentially and updates the orbit continuously. In the least 
squares method the data is processed in batches and when additional data is re- 
ceived the normal equations are again solved to obtain a new estimate of the state 
vector. 
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DERIVATION OF THE KALMAN-SCHMIDT METHOD 


Let the nonlinear equations of motion in rectangular coordinates be given 

by 

X = g x (X, y, z, t) 

y = g 2 (x, y, z. t) ^ 

Z = g 3 (x, y, z, t) . 

It is desirable to obtain linear differential equations that represent pertur- 

bations of the actual trajectory from the reference. The usual assumption is made 
that the perturbations to the coordinates are small and that equations (1) can be 
expanded in a Taylor series retaining only terms of first order. For example, 
the equation for x, when expanded about the reference position x R , y R , z R , 
becomes 


/ 3g A 

* - Si ( X R’ ^R’ Z R> + bT - 

V 8 /t 


(x-x R ) 


t R 


+ 



(y-y R ) + 

t R 



(z-z R ). 


(2) 


The equations for y and z are written similarly. The partial derivatives 
are evaluated at the reference position and it is to be understood that the refer- 
ence quantities x R , y R , z R , are varying functions of time obtained from a refer- 
ence trajectory. If the reference is the estimated trajectory, then the partial 
derivatives are evaluated each time a new estimate of the state vector is obtained. 
Define a deviation state vector 8 x( t) to be the vector of deviations of position and 
velocity from reference 
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y(t) - y R (t) 


Sx 2 (t) 


y(t) - y R (t) 

Sx 3 (t) 


z(t) - z R (t) 

§ x 4 (t) 


x(t) - x R (t) = axj(t> 

Sx 5 (t) 


y(t) - y R (t) = Sx 2 (t) 

_ Sx 6 ( t) _ 


z(t) - z R (t) = Sx 3 (t)_ 


With the above definition equations (2) take the form 



It is desirable to reduce the set of equations (4) to a set of first order 
equations 



Sx s = 


ai 6 = 


B g 2 

3 g 

S x, + 

8 x 0 

B g 2 

Bx r 

1 3 Vr 

2 

+ Bz r 

^3 

s Bg3 

OX. + 

8 x 0 

B g 3 

Bx r 

1 3 Vr 

2 

+ Bz r 


8 x. 


( 5 ) 


It is convenient to consider the linear system of perturbation equations (5) 
in matrix notation as 


Sx = F(t) Sx(t) 


( 6 ) 


where Sx (t) is a 6 vector and F(t) is a continuous matrix function of the argu- 
ment t in some interval (a, b) 


where 


F(t ) = 


0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

1 

hi 

Bx r 

hi 

B y R 

B Si 

Bz r 

0 

0 

0 

B g 2 

Bx r 

hi 

B y R 

B g 2 

Bz r 

0 

0 

0 

B e 3 

^3 

hi 

0 

0 

A 

Bx r 

B y R 

Bz r 

u 


G(t ) = 


0 

3 X 3 


G(t ) 

3 X 3 


6X6 


hi 

hi 

hi 

Bx r 

B y R 

Bz r 

111 

B g 2 

B g 2 

Bx r 

B y R 

Bz r 

^3 

hi 

B g 3 

Bx r 

B y R 

Bz r 


I 

3 X 3 

0 

3 X 3 


( 7 ) 


( 8 ) 


— I 3x 3 
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We consider the system of differential equations (6) 


= s if = (Sx) = F(t) Sx(t) (9) 

where F (t) is a continuous matrix function of the argument t in some interval 
(a, b). An integral matrix of the system (9) shall be defined as a square matrix 
[ U ( t) ] 6x6 whose columns are 6 linearly independent solutions of the system. 
Since every column of U(t) satisfies (9), the integral matrix U (t) satisfies the 
equation 


TT" = F(t) U . (10) 

d t 

From the theorem on the existence and uniqueness of the solution of a system 
of differential equations it follows that the integral matrix U (t) is uniquely deter- 
mined when the value of the matrix for some initial value t = t Q is known, 

U(t 0 ) = U 0 . We can take an arbitrary nonsingular square matrix of order n for 
the matrix U 0 . In particular if U (t 0 ) = I, the integral matrix U (t) will be called 
normalized. Let us define (F) or simply as the normalized solution of 
(10) and call this solution the matricant or state transition matrix. 

(F) can be obtained one column at a time by solving (9) 6 times each with 
a different member of Sx(t 0 ) set equal to unity and all the other elements set 
equal to zero. After (F) has been obtained the solution of (9) for Sx(t) is 
given by 


Sx(t) =n‘ t0 (F) Sx(t 0 ) (11) 

where Sx(t 0 ) is the given set of initial conditions. The transition in the states of 
the system of equations from time t Q to time t is given by (11) and hence relates 
deviations from the reference trajectory at time t to the initial deviations at time 


The equations that relate the observables to the state variables are also 
linearized. Let y be the n vector of observations. Let y (x(t); t) be the esti- 
mate of the measured data based on the current estimate of the trajectory. The 
first order Taylor expansion about a reference trajectory then gives 

Sy(t') = y(t) - y (t) = H(t) 8x(t) + e (12) 
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where H(t) is the matrix given by equation (4) in the section called "conventional 
differential orbit correction." 

It is required to find the optimum estimate 8x (t) from equation (12). It 
will be assumed that there exists a loss function which measures the loss in- 
curred if incorrect estimates are made. Clearly the loss is a positive and non- 
decreasing function of the estimation error. It will also be assumed that the 
optimal estimate is a linear estimate, i.e., a linear function of the observations. 
Results obtainable by linear estimation can be improved by non-linear estimation 
only when the random processes are non-Gaussian and only then by considering 
at least third order probability distribution functions, reference [2] . 

The assumptions that were made about the optimal estimate permit one to 
regard this optimal estimator as a linear filter where the input signal consists 
of an observation corrupted by Gaussian noise and the output is the estimate of 
the state at present time. The optimal properties of the filter depend upon a 
certain weighting matrix K(t). The observation errors, e (t), are represented 
as the output of a linear dynamic system excited by white noise, a standard 
engineering trick, valid when only second-order statistics are concerned. The 
injection conditions are regarded as a vector valued random variable with zero 
mean. The matrix P is defined as being the covariance matrix of deviations 
between the actual and reference trajectories. For example at injection, the 
matrix P is the covariance matrix of injection errors and is assumed to be 
known. The covariance matrix P at any time t is defined as 


P(t) = E [Sx(t) 8x T (t)] (13) 

where Sx (t) is the deviation between the actual and reference trajectories. If 
8x (t 0 ) is known then 

Sx(tj) = n‘1 Sx(t 0 ). (14) 

By substitution of (14) into (13) 

P(t 1 )=E[8x(t 1 )8x T (t 1 )] =E[n*i Sx(t 0 ) 8x T (t 0 ) 

= E D^o) 8 xT ( t o)] (°^) T 

= p (t 0 ) (^;) T - 
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In general 


p(V = n !j., p <t».i) ( q !L,) T <15 > 

The covariance matrix of the observations as derived from (12) is 
Y(t) = E [Sy(t) Sy T (t)] = E [(H(t) Sx(t) + e) (H(t) Sx(t) + e) T ] 

= E [ (H(t) Sx (t) + e) (Sx T (t) H T (t) + e T )] 

= H(t) E [Sx(t) Sx T (t)] H T (t) + E [e e T ] 

T (16) 
- H(t ) fi P(t) H T (t) + Q. 

0 

If the stochastic processes employed are assumed to be Gaussian then 
orthogonal projection as mentioned earlier is actually identical with conditional 
expectation. The approach to be taken here is one of applying concepts from 
statistical decision theory as explained earlier in the section called "Bayes 
estimators." Assuming continuous distributions, the density of the posterior 
distribution h (Sx | Sy) is given by 


h(Sx | Sy) 


f (Sy | Sx) g(Sx) 
f (Sy) 


(17) 


where g(Sx) is the prior density, f (Sy| Sx) is the conditional distribution of Sy 
given a certain value of Sx, and f (Sy) is the absolute density of Sy defined by 


f (Sy) = f f (Sy | Sx) g(8x) d($x) = E g [f (Sy) | Sx] . 

The method is to compute the expected posterior loss 

E h ['t(Sx, Sx)] = j ^(Sx, Sx)h(Sx|Sy) d(8x) (18) 

" —CO 

and then to determine Sx which minimizes this quantity. Assume the loss func- 
tion to be 

^(Sx, Sx) = Sx T Sx. (19) 
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Then, 


E [-t (Sx, Sx)] =1 Sx T Sx h(Sx| Sy) d(Sx) (20) 

—CD 

Since the second moment of a variable is a minimum when it is taken about 
the mean of the variable, it follows that this integral is minimized for each value 
of Sy, if Sx is chosen as the mean of the conditional distribution of Sx for Sy 
fixed. Hence the conditional mean is the optimal estimate, or 

Sx = E [ Sx | Sy ] . 


Let Z = ( g) be an augmented state vector where e is the noise vector in 
Sy(t) = H (t) Sx(t) + e. Let it be assumed that the state vector Z belongs to a 
normal distribution in order to simplify the formulas for updating the best esti- 
mate Sx. This assumption is not at all necessary since the results to be obtained 
hold under much more general conditions. 

The problem of updating the estimate Sx is divided into two parts: the first 
is to update Sx when at time t new data y (t) becomes available. The second is 
to update Sx from t to t + At. 

The covariance matrix of Z - Z is assumed to be of the form. 


E [(Z-Z) (Z-Z) T ] = E 



= E 


x-x 

e-e 


(x~x) T (e-e) T 


(x-x) (x— x) T , (x-x) (e-e) T ~ 


1 

o 

_(e-e) (x-x) T (e-e) (e-e) T _ 


”c> 

o 


and since e is assumed to be independent of x, P is the covariance matrix of Sx, 
and Q is the covariance matrix of e . 

We can write 


y = MZ = (H| I) 



= Hx + e . 
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It is assumed that E [e] =0. Hence 


E [Z] 




We wish to obtain an improved value of the current estimate > 
information y is available. 

The probability density function P (Z) is 

P( Z ) = - e ~ 1/2 (Z ~~ Z)T P z X (Z ~ Z) 

| P z ! 1/2 (27T) n/2 

where n is the number of components of Z. Since 

y-y = MZ-MZ = M(Z-Z), 

E [(y-y) (y-y) 7 ] = e[m(Z-Z) (M(Z-Z)) t ] =E[M(Z-Z) (Z 
= M E [(Z-Z) (Z-Z) T ] M T = M P z M t 


so, 


P(y) = 


|mp z m t 


1/2 


,*1/2 (y 


y) T 


(MP. 


T - 
M 1 ) 


(y-y) 


( 277 ) 


m/2 


L_ e -1/2 ( z_2 ) T “ T < mp z mT ) -1 m < 

MP Z M t | 1/2 (277) 1 "/ 2 


where m is the number of components of y. 


when new 


-Z) t M t ] 
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Suppose we have two random vectors y and Z that have a multivariate 
normal distribution with means and ij. 2 respectively. Suppose we want to 
estimate the expected value of Z for a given y. This suggests the use of the 
conditional distribution of the multivariate normal. To obtain an analytical ex- 
pression for P(Zjy), Bayes theorem is employed: 

P(Z | y) = Hi L ? >- P l z ) 

P(y) 


m-n 

= Qnl L |MP_ M t | 1/2 e~ 1/2 

Ip 1 1 /2 

1 z 1 


[(z-zyp’ 1 (Z-Z) -(Z-Z) t M t (MP z M t ) _1 M (Z-Z)] 


The optimal estimate is the mean value of this density function, i.e., E (z|y). 
The mean value is obtained when the exponent is minimized. Hence, 

^ [(Z-Z) T P z 1 (Z-Z) - (Z-Z) T M T (MP Z M T )" 1 M(Z-Z)] = 0 

or 

2P' 1 (Z-Z) - 2 M t (MP z M t ) _1 M(Z-Z) = 0 
P z [p- 1 - M t (MP z M T r 1 M] (Z-Z) = 0 
I (Z-Z) - P z M T (MP Z M T ) -1 M(Z-Z) = 0 
Z-Z = p z M T (MP Z M T ) _1 (y-y) 

Z = Z + p z M t (MP z M t ) _1 (y-y) = Z + K(t) (y-y). 

m 

Since 

y = MZ and y = MZ 
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p z m t = 

M P z M t = (H | I) j = HPH t + Q, 

the new estimate Z n is given by 

/ P H t \ 

z n = z p + ( J (HPH T +Q)* 1 (y-y) 

where Z p represents the estimate prior to the time new data is used. Now 

x n - X p = PH t (HPH t + Q)' 1 (y-y). 

Premultiplication of this equation by H results in 

H Ax = HPH t (HPH t + Q) -1 Ay 

where 

Ax = x n - x p , Ay = y - y . 

This equation shows the weighting effect of Q. For example, if Q = 0 then 
H Ax =(HPH t )(HPH t )- 1 Ay = I Ay = Ay. 

Let the weighting matrix K = P z M T (MP Z M T ) _1 . Then Z n = Z p + K(y-y) 
and the new covariance matxix is 

E < z .-z„KZ„-z„) r = *(z.-4 p -K(y-y)) (z„-z p -K (y -y)) T 
= E ((I-KM) (Z„-Z p )) ((X-KM) (Z„-Z p )) T 


= (I-KM) E(Z n -Z p ) (Z n -Z p ) T (I-KM) t = (I-KM) P z (I-KM) T 
= (P Z -KMP Z ) (I-(K M) t ) 

= P z - KMP Z - P z (KM) t + KMP Z (KM) t 
= P z - KMP Z - P z M t K t + P z M t (MP z M t )- 1 (MP z M t ) K t 
= P z - KMP Z - P z M t K t + P z M t K t 
= (I-KM) P z . 


Hence the new estimate of P z is 

Pznew^a-KM) P Z old- 
I - KM - I -P z M t (MP z M t )- 1 M 
/ PH t \ 

= I -( j (MP Z M T )“ 1 (H| I). 

If the y vector is of dimension one or the noise components are independent 
of one another and y can be treated one component at a time, then HPH T + Q = 
MP z M t is a scalar. 

In this case 


I - 



(MPjrM 7 )- 1 (H| I) = I - 


HPH t + Q 


/PH t H 
\ QH 



and 


. 1 /PH T (y-y)N 

n ." p + HPH t + Q V Q(y-y) ) 


y = MZ = Hx since e = 0 
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'PH 1 y - PH T H x 


HPH t + Q \ Qy - QHx 


If the noise is time independent or the components are independent then 
when the P z matrix is updated n time only the P part remains. Hence, the only 
part of the P z matrix that need be kept is the lower right hand corner. For the 
case where y is a scalar 



x /ph t h ph t \ 


~P ol 

P 7 = (I -KM) P 7 . . = 

Z new v 7 Z old 

HPH t + Q \ QH Q /_ 


I 

O 

0 

1 


rp o i 

new 


"p 

0 

_ 1 

/PH t HP 

PH t Q\ 

0 Q 

_ v new J 


0 

Q _ 

HPH t + Q 

\ QHP 

Q 2 j 


Hence , 


P 


new 


ph t hp 

HPH t + Q 


Q- 


HPH t + Q 


The Matricant 


We consider a system of differential equations 



where F (t) = [F ik ] nxn is a continuous matrix function of the argument t in 
some interval (a, b). We use the method of successive approximations to deter- 
mine a normalized solution of (1), i.e., a solution that for t = t Q becomes the 
unit matrix (t Q is a fixed number of the interval (a, b)j. The successive approx- 
imations x k (k = 0, 1, 2,...) are found from the recurrence relations 


dx k 

dt 


= F(t) x k j (k = 1, 2, . . 


when x Q is taken to be the unit matrix I , 


Setting x k ( t Q ) = I ( k = 0, 1, 2, . . . ) we may represent x k in the form 


x k = M* 


o>+f F(t) x k-i dr - 
Jt o 


Thus, 

x 0 = I, X x = I + f F(t) dr, x 2 = I + f F (V) x t dr = 
Jt o 


ft 

r 

F(r) 

I + I F(a) da 

J t 0 

L J t 0 J 


dr 


or 


x 2 = I + f F(r) dr + f F(r) f 


FCcr) da dT, . . . 


i.e., x (k = 0, 1, 2, . . . ) is the sum of the first k + 1 terms of the matrix series 


r 


I + | F (t) dr + 


r r 


F(a) da dr, 


( 2 ) 


The series (2) is absolutely and uniformly convergent in every closed sub- 
interval of the interval (a, b) and determines the required solution of (1). This 
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solution will be denoted by fi* (F) or simply fl* and is often called the matri- 
cant. The representation of the matricant in the *iorm of such a series was first 
obtained by Peano in 1888. 

Properties of the Matricant 

1. n\ =n\ a . 1 (t. , t., t € (a, h)) 

l 0 *1 l 0 0 l 

2 . n* (F + G) = ft! (F) n* (G)with G=[fi? (F)] _1 Gfi‘ (F) 

* o *0 *o *o t o 

3. in |fl‘ (F) | = / trFdr 

0 *o 

We shall now show how to express by means of the matricant the general 
solution of a system of linear differential equations with right hand sides: 


dx. 

i 

77 


n 

L 

k= 1 




+ F. (t) 


F. k (t) and (t) (i, k = 1, 2, . . . , n ) are continuous functions of t in some 
interval. 

By introducing the column matrices x = (x 1# x 2 , . . . , x n ) and F = ( f 1? f 2 , 
. . . , f ) and the square matrix F, we write the system as follows: 


— = F(t) x + f(t). (3) 

dt 

We shall look for a solution of this equation in the form 

x = n* (F) z. 
t 0 

where z is an unknown column depending on t . We substitute this expression 
for x in (3) and obtain: 


F n\ (F) Z.+ Q\ (F) — = Ff) t l (F) z + f (t) ; 
o l o dt o 
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hence 

— = [fi*: CF)]" 1 f (t). 

dt o 

Integrating this, we find: 


z = [fl T t (F) ] _1 f(r) dr + c, 

*' t o 

where c is an arbitrary constant vector. Substituting this expression in (3), we 
obtain: 


x 



[ftT (F) ] -1 Hr) dr + ft* (F) c 
t o o 


(4) 


When we give to t the value t 0 , we find: x( t 0 ) = c . Therefore (4) assumes the 
form 


where 


x = ft* (F) x(t ) + 
r o u 


f K( t, r) f(r) dr 
Jt 0 


(5) 


K(t, r) = ft* (F) [ft!' (F)] -1 (Cauchy matrix). 
0 *0 


Let us consider the matricant ft* (F). We divide the basic interval (t Q , t) 
into n parts by introducing intermediate points , t 2 , . . . , t n _j and set 
At fe = t k - t k _ (k = 1, 2, . . . , n ; t = t ). Then by property 1 of the matricant 


( V n '' , 


• n ', 2 

l *o 


The multiplicative integral first introduced by Volterra in 1887 is given by 


(F) 

t o 



(I + Fdt) = lim [I + F(r ) At ] . . . [I + F(r )At ] . 
At k -0 
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We now introduce the multiplicative derivative 

D x = — X -1 , where x = Q* (F) . 

1 dt ‘o 

Updat ing the Estimate x (t) in Time, i.e., from t to t + At. 

The linear perturbation equation appears in the form 

x(t) = F(t) x(t) + f(t) 

where x (t) is the state vector and F ( t) is the perturbation matrix. All solutions 
of this linear differential equation can be written in the form 


x(t) = n* x(t ) 

where x(t 0 ) is a vector of initial conditions at time t Q , Q* is the state transition 
matrix. 

The solution of the differential equation is updated from t Q to t by (5), i.e., 


x(t ) = Cl\ (F) x(t 0 ) + f K(t, t) f (t) dr 
° 


where 


K(t, t) = nj (F) [nr (F)]- 1 . 

t o o 
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